17 research outputs found
Data Collection and Quality Challenges in Deep Learning: A Data-Centric AI Perspective
Data-centric AI is at the center of a fundamental shift in software
engineering where machine learning becomes the new software, powered by big
data and computing infrastructure. Here, software engineering needs to be
rethought so that data becomes a first-class citizen on par with code. One
striking observation is that a significant portion of the machine learning
process is spent on data preparation. Without good data, even the best machine
learning algorithms cannot perform well. As a result, data-centric AI practices
are now becoming mainstream. Unfortunately, many datasets in the real world are
small, dirty, biased, and even poisoned. In this survey, we study the research
landscape for data collection and data quality primarily for deep learning
applications. Data collection is important because recent deep learning
approaches need less feature engineering and instead far larger amounts of
data. For data quality, we study data validation,
cleaning, and integration techniques. Even if the data cannot be fully cleaned,
we can still cope with imperfect data during model training using robust model
training techniques. In addition, while bias and fairness have been less
studied in traditional data management research, these issues become essential
topics in modern machine learning applications. We thus study fairness measures
and unfairness mitigation techniques that can be applied before, during, or
after model training. We believe that the data management community is well
poised to solve these problems.
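As a concrete illustration of the data-validation techniques the survey covers, the sketch below checks rows against a simple per-column schema and separates clean from dirty records. The schema format and the `validate` helper are illustrative inventions, not taken from the survey.

```python
# Minimal data-validation sketch: route each row to "clean" or "dirty"
# depending on whether it satisfies per-column type and range rules.

def validate(rows, schema):
    """Return (clean, dirty) lists of rows according to the schema."""
    clean, dirty = [], []
    for row in rows:
        ok = all(
            col in row
            and isinstance(row[col], rule["type"])
            and rule["lo"] <= row[col] <= rule["hi"]
            for col, rule in schema.items()
        )
        (clean if ok else dirty).append(row)
    return clean, dirty

schema = {"age": {"type": int, "lo": 0, "hi": 120}}
rows = [{"age": 34}, {"age": -5}, {"age": "n/a"}]
clean, dirty = validate(rows, schema)
# the out-of-range and wrongly typed rows end up in "dirty"
```

Real systems layer far more on top of this (statistics-based checks, schema inference), but the clean/dirty split is the basic primitive.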
Q-HyViT: Post-Training Quantization for Hybrid Vision Transformer with Bridge Block Reconstruction
Recently, vision transformers (ViTs) have superseded convolutional neural
networks in numerous applications, including classification, detection, and
segmentation. However, the high computational requirements of ViTs hinder their
widespread implementation. To address this issue, researchers have proposed
efficient hybrid transformer architectures that combine convolutional and
transformer layers with optimized attention computation of linear complexity.
Additionally, post-training quantization has been proposed as a means of
mitigating computational demands. For mobile devices, achieving optimal
acceleration for ViTs necessitates the strategic integration of quantization
techniques and efficient hybrid transformer structures. However, no prior
investigation has applied quantization to efficient hybrid transformers. In
this paper, we discover that applying existing PTQ methods for ViTs to
efficient hybrid transformers leads to a drastic accuracy drop, attributed to
the four following challenges: (i) highly dynamic ranges, (ii) zero-point
overflow, (iii) diverse normalization, and (iv) limited model parameters
(5M). To overcome these challenges, we propose a new post-training
quantization method that is the first to quantize efficient hybrid ViTs
(MobileViTv1, MobileViTv2, Mobile-Former, EfficientFormerV1, EfficientFormerV2),
outperforming existing PTQ methods (EasyQuant, FQ-ViT, and PTQ4ViT) by a
significant margin (an average improvement of 8.32\% for 8-bit and 26.02\% for
6-bit). We plan to release our code at \url{https://github.com/Q-HyViT}.

Comment: 12 pages, 8 figures
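For context, the sketch below shows the basic per-tensor asymmetric uniform quantization primitive that PTQ methods build on; it is a generic sketch, not the paper's method. It also makes the paper's "zero-point overflow" challenge concrete: the zero point is derived from the tensor's min/max, and for skewed activation ranges the computed value can leave the integer range.

```python
import numpy as np

def quantize(x, n_bits=8):
    """Per-tensor asymmetric uniform quantization to n_bits integers."""
    qmin, qmax = 0, 2 ** n_bits - 1
    scale = (x.max() - x.min()) / (qmax - qmin)
    # For heavily skewed ranges this value can fall outside [qmin, qmax],
    # which is the "zero-point overflow" failure mode the paper identifies.
    zero_point = int(round(qmin - x.min() / scale))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    return (q.astype(np.float32) - zero_point) * scale

x = np.linspace(-3.0, 3.0, 256).astype(np.float32)
q, scale, zp = quantize(x)
err = np.abs(dequantize(q, scale, zp) - x).max()  # bounded by ~one scale step
```

Methods like the paper's add calibration and per-block reconstruction on top of this primitive to recover the accuracy that naive min/max quantization loses.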
Robust Data Pruning under Label Noise via Maximizing Re-labeling Accuracy
Data pruning, which aims to downsize a large training set into a small
informative subset, is crucial for reducing the enormous computational costs of
modern deep learning. Though large-scale data collections invariably contain
annotation noise and numerous robust learning methods have been developed, data
pruning for the noise-robust learning scenario has received little attention.
With state-of-the-art Re-labeling methods that self-correct erroneous labels
while training, it is challenging to identify which subset induces the most
accurate re-labeling of erroneous labels in the entire training set. In this
paper, we formalize the problem of data pruning with re-labeling. We first show
that the likelihood of a training example being correctly re-labeled is
proportional to the prediction confidence of its neighborhood in the subset.
Therefore, we propose a novel data pruning algorithm, Prune4Rel, that finds a
subset maximizing the total neighborhood confidence of all training examples,
thereby maximizing the re-labeling accuracy and generalization performance.
Extensive experiments on four real and one synthetic noisy datasets show that
Prune4Rel outperforms the baselines with Re-labeling models by up to 9.1% as
well as those with a standard model by up to 21.6%.
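The selection idea described above can be sketched as a greedy subset choice that maximizes total neighborhood confidence: each example is "covered" by its most confident selected neighbor, and we repeatedly add the example with the largest marginal coverage gain. The affinity measure (cosine similarity) and the greedy formulation are simplifications for illustration, not the paper's exact algorithm.

```python
import numpy as np

def greedy_prune(features, confidence, budget):
    """Greedy facility-location-style selection of a confident subset."""
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = np.clip(f @ f.T, 0.0, 1.0)      # neighborhood affinity in [0, 1]
    contrib = sim * confidence             # contrib[i, j] = sim(i, j) * conf(j)
    covered = np.zeros(len(features))      # best confident neighbor so far
    chosen = []
    for _ in range(budget):
        # marginal gain of adding candidate j to the subset
        gain = np.maximum(contrib - covered[:, None], 0.0).sum(axis=0)
        gain[chosen] = -1.0                # never re-pick a selected example
        j = int(gain.argmax())
        chosen.append(j)
        covered = np.maximum(covered, contrib[:, j])
    return chosen

rng = np.random.default_rng(0)
features = rng.normal(size=(50, 8))        # stand-in embeddings
confidence = rng.uniform(size=50)          # stand-in prediction confidences
subset = greedy_prune(features, confidence, budget=10)
```

The coverage objective is monotone submodular, so this kind of greedy selection comes with the usual (1 - 1/e) approximation guarantee.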
Time Is MattEr: Temporal Self-supervision for Video Transformers
Understanding temporal dynamics of video is an essential aspect of learning
better video representations. Recently, transformer-based architectural designs
have been extensively explored for video tasks due to their capability to
capture long-term dependencies in input sequences. However, we find that these
Video Transformers are still biased toward learning spatial dynamics rather
than temporal ones, and that debiasing this spurious correlation is critical
for their performance. Based on these observations, we design simple yet effective
self-supervised tasks for video models to learn temporal dynamics better.
Specifically, to counteract the spatial bias, our method learns the temporal
order of video frames as extra self-supervision and enforces randomly shuffled
frames to have low-confidence outputs. Also, our method learns the
temporal flow direction of video tokens among consecutive frames for enhancing
the correlation toward temporal dynamics. Under various video action
recognition tasks, we demonstrate the effectiveness of our method and its
compatibility with state-of-the-art Video Transformers.

Comment: Accepted to ICML 2022. Code is available at
https://github.com/alinlab/temporal-selfsupervisio
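The low-confidence objective for shuffled frames can be sketched as a cross-entropy against the uniform distribution: when frame order is randomly permuted, the model's output should carry no information, so the loss pushes its predictions toward maximum uncertainty. The names and shapes below are illustrative, not from the paper's code.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def uniformity_loss(logits):
    """Cross-entropy against the uniform distribution over classes.

    Minimized (at log K) exactly when the model output is uniform,
    i.e. the model is maximally unsure about the shuffled clip.
    """
    p = softmax(logits)
    k = logits.shape[-1]
    return -np.mean(np.sum(np.full(k, 1.0 / k) * np.log(p + 1e-12), axis=-1))

rng = np.random.default_rng(0)
frames = rng.permutation(np.arange(8))       # randomly shuffled frame order
logits_shuffled = rng.normal(size=(4, 10))   # model outputs on 4 shuffled clips
loss = uniformity_loss(logits_shuffled)      # auxiliary debiasing loss
```

In training, a term like this on shuffled clips would be added to the standard classification loss on correctly ordered clips, alongside the temporal-order prediction task.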
ReFine: Re-randomization before Fine-tuning for Cross-domain Few-shot Learning
Cross-domain few-shot learning (CD-FSL), where there are few target samples
under extreme differences between the source and target domains, has recently
attracted considerable attention. Recent studies on CD-FSL generally focus on
transfer-learning-based approaches, where a neural network is pre-trained on popular
labeled source domain datasets and then transferred to target domain data.
Although the labeled datasets may provide suitable initial parameters for the
target data, the domain difference between the source and target might hinder
fine-tuning on the target domain. This paper proposes a simple yet powerful
method that re-randomizes the parameters fitted on the source domain before
adapting to the target data. The re-randomization resets source-specific
parameters of the source pre-trained model and thus facilitates fine-tuning on
the target domain, improving few-shot performance.

Comment: CIKM 2022 Short; 5 pages, 3 figures, 4 tables
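The re-randomization step described above amounts to re-initializing the source-specific (typically later) layers of a pre-trained parameter set while keeping the generic early layers, before fine-tuning on the target data. The layer names, the choice of which layers to reset, and the Gaussian re-initialization below are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def refine_reset(params, layers_to_reset, rng):
    """Return a copy of params with the named layers re-randomized."""
    out = dict(params)
    for name in layers_to_reset:
        # Re-initialize from a simple Gaussian as a stand-in for the
        # model's original initialization scheme.
        out[name] = rng.normal(0.0, 0.02, size=params[name].shape)
    return out

rng = np.random.default_rng(42)
pretrained = {                              # hypothetical pre-trained weights
    "block1.weight": rng.normal(size=(16, 3)),
    "block4.weight": rng.normal(size=(64, 32)),
    "head.weight": rng.normal(size=(5, 64)),
}
reset = refine_reset(pretrained, ["block4.weight", "head.weight"], rng)
# early layers are kept; later, source-specific layers are re-randomized
assert np.array_equal(reset["block1.weight"], pretrained["block1.weight"])
```

Fine-tuning would then proceed from `reset` on the few labeled target samples, so the retained early layers supply transferable features while the reset layers are free to fit the new domain.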